Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plotting Wrappers: Occupancy Histogram #403

Merged
merged 24 commits into from
Feb 18, 2025

Conversation

willGraham01
Copy link
Contributor

@willGraham01 willGraham01 commented Feb 4, 2025

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?

See #388 and related, #5 (which is actually also closed)

What does this PR do?

Adds the plot_occupancy function to the movement.plots module. This function takes in (time, space [x, y])-data and produces a histogram showing the distribution of positions across all time-points.

By default, any additional axes in the input da (DataArray) are collapsed onto the 0th-index, to provide the expected 2D data input. The selection argument can be used by the user to specify alternative coordinates along non-spacetime dimensions to collapse onto instead.

plot_occupancy returns the usual figure and axes objects, however also returns information from the plotted histogram as its third value. This is mainly because this information is difficult to re-extract from the returned axes figure. The counts information in particular would technically otherwise be lost since QuadMesh objects (that store histograms) only retain the colour-mapped values (which may blur across bins with similar, but distinct counts), and not the raw counts in each bin.

References

Closes #388. Additionally, this hopefully goes some way towards addressing #5, since we are returning the histogram data as the 3rd return value.
Closes #5 too.

How has this PR been tested?

Addition of tests to cover expected functionality, and possible edge cases.

Is this a breaking change?

No

Does this PR require an update to the documentation?

#410

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@willGraham01 willGraham01 force-pushed the wgraham-388-occupancy-histogram branch from 3fe51e8 to 2cc312a Compare February 4, 2025 13:05

This comment was marked as resolved.

@willGraham01 willGraham01 changed the title Plot wrapper for occupancy histogram Plotting Wrappers: Occupancy Histogram Feb 4, 2025
@willGraham01 willGraham01 linked an issue Feb 4, 2025 that may be closed by this pull request
@willGraham01 willGraham01 force-pushed the wgraham-388-occupancy-histogram branch from 415f620 to e412849 Compare February 11, 2025 11:13
@willGraham01 willGraham01 marked this pull request as ready for review February 11, 2025 11:14
@willGraham01
Copy link
Contributor Author

@sfmig your comment on #5 indicates that it would be useful to have certain bits of information from the plot as outputs from this kind of function. Currently I'm just exposing the other hist2d outputs (that are suppressed by the wrapper otherwise) to the user here, not sure if you had more detailed outputs in mind when writing your comment.

But if so, we can also close #5 with this PR too.

@sfmig
Copy link
Contributor

sfmig commented Feb 11, 2025

thanks for checking @willGraham01 !

The point of that comment was that often you not only want the figure, but also the 2D array with the bin counts. From your comment ...

Currently I'm just exposing the other hist2d outputs

seems like that is covered? So I think we can close #5 yay 😄 🚀

(Just fyi I vaguely remember this was something Sepi requested but not sure)

@willGraham01
Copy link
Contributor Author

willGraham01 commented Feb 11, 2025

(Just fyi I vaguely remember this was something Sepi requested but not sure)

I hope it is b/c otherwise I've just wasted 5 hours of Niko's grant 🤭 😂 But will mark #5 as closable by this 🥳

@niksirbi
Copy link
Member

I will finish reviewing this tomorrow, but I can already do some cool things with this!

image

See source code for this figure
import numpy as np
from matplotlib import pyplot as plt

from movement import sample_data
from movement.plots import plot_occupancy

# Load the sample dataset 
ds = sample_data.fetch_dataset("DLC_two-mice.predictions.csv")

# Compute the centroid of all keypoints
centroid_position = ds.position.mean("keypoints")

image = plt.imread(ds.attrs["frame_path"])

# Construct bins of size 20x20 pixels that cover the entire image
bin_pix = 30
bins = [
    np.arange(0, image.shape[0] + bin_pix, bin_pix),
    np.arange(0, image.shape[1] + bin_pix, bin_pix),
]

# Initialize the figure and axis
fig, ax = plt.subplots()

# Show the image
ax.imshow(image)

# Plot the occupancy 2D histogram for each individual
_, _, hist_data = plot_occupancy(
    da=centroid_position,
    selection={"individuals": "individual1"},
    ax=ax,
    cmap="viridis",
    alpha=0.5,
    bins=bins,
    cmin=3,      # Set the minimum shown count
    norm="log"
)

# Set the axis limits to match the image
ax.set_xlim(0, image.shape[1])
ax.set_ylim(image.shape[0], 0)

Copy link
Member

@niksirbi niksirbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @willGraham01!

I’ve added some comments, mostly about aligning the function signature (and default behavior) with that of plot_trajectory().

Regarding your discussion with Sofía:
Yes, this approach technically meets the requirement of also obtaining the occupancy data as a 2D array, which is excellent. However, it can be slightly awkward to always rely on the plotting function when all you need is the occupancy array. There may be scenarios where the user only wants the 2D occupancy array—without the plot—for comparisons with neural data. From that perspective, it might be more intuitive to have a dedicated compute_occupancy function that returns both the 2D array and the bin edges. We could discuss the best data structure to return—whether that’s an xr.DataArray or multiple NumPy arrays, similar to hist2d.

In any case, I suggest merging this PR with just plot_occupancy (after addressing my comments) and leaving compute_occupancy for a future PR. We just need to ensure that both functions produce consistent histogram data, i.e. compute_occupancy should use the same underlying method as hist2d.

movement/plots/__init__.py Outdated Show resolved Hide resolved
movement/plots/occupancy.py Outdated Show resolved Hide resolved
movement/plots/occupancy.py Show resolved Hide resolved
movement/plots/occupancy.py Outdated Show resolved Hide resolved
movement/plots/occupancy.py Outdated Show resolved Hide resolved
@willGraham01 willGraham01 force-pushed the wgraham-388-occupancy-histogram branch from e412849 to 8ecd914 Compare February 14, 2025 14:45
@willGraham01
Copy link
Contributor Author

Function signature is now in line with plot trajectory, and I've implemented the default behaviour of "take centroid of keypoints, then aggregate over individuals".

Note that the aggregation over individuals works by "stacking" the input DataArrays "individual" axis along the "time" axis. EG a (10, 2, 4) time-space-individuals array can be reshaped and counted as a (40, 2) array. This:

  • Makes it robust against NaN values (NaN values can be filtered on a per-position basis, since xarray.dropna currently doesn't support dropping NaNs along multiple axes simultaneously).
  • Prevents any issues with bin sizes not aligning. If we count each individual first, then sum the counts, providing bins=[30,20] will give non-aligned bin edges. By stacking first, we avoid this issue (note that passing explicit bin edges works in both cases, regardless).

Copy link
Member

@niksirbi niksirbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @willGraham01! I like your approach to aggregating, it's entirely sensible.

I left some final comments to be addressed. I'll pre-approve this PR so you have the freedom to merge it as soon as you're done addressing those comments.

movement/plots/occupancy.py Show resolved Hide resolved
movement/plots/occupancy.py Outdated Show resolved Hide resolved
movement/plots/occupancy.py Outdated Show resolved Hide resolved
movement/plots/occupancy.py Outdated Show resolved Hide resolved
movement/plots/occupancy.py Outdated Show resolved Hide resolved
if "individuals" in data.dims:
data = data.stack(
{"new": ("time", "individuals")}, create_index=False
).rename({"new": "time"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some cases I get the following warning here:

/Users/nsirmpilatze/Code/NIU/movement/movement/plots/occupancy.py:157: UserWarning: rename 'new' to 'time' does not create an index anymore. Try using swap_dims instead or use set_index after rename to create an indexed coordinate.
  ).rename({"new": "time"})

movement/plots/occupancy.py Outdated Show resolved Hide resolved
movement/plots/occupancy.py Outdated Show resolved Hide resolved
movement/plots/occupancy.py Outdated Show resolved Hide resolved
movement/plots/occupancy.py Outdated Show resolved Hide resolved

This comment was marked as resolved.

@willGraham01 willGraham01 added this pull request to the merge queue Feb 18, 2025
Merged via the queue into main with commit 8fbc991 Feb 18, 2025
28 checks passed
@willGraham01 willGraham01 deleted the wgraham-388-occupancy-histogram branch February 18, 2025 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Plotting wrappers: Occupancy Heatmap Plot occupancy heatmap
4 participants